Iterative Treebank Refinement
Abstract
Treebanks are a valuable resource for training parsers that automatically annotate unseen data. Changes in the representation of linguistic annotation have been shown to affect performance on a given annotation task. In this work we focus on topological field parsing for German using probabilistic context-free grammars. We investigate an iterative algorithm for tuning the label set of a given treebank to this task and show that the number of parses proposed by a context-free grammar is reduced considerably, while labeled precision and recall for the annotation of node labels increase. We also show that the optimal refinement can be achieved with a relatively small number of changes to the treebank.
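The labeled precision and recall mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard bracket-matching definition, in which a predicted constituent is correct only if its label and span both match a gold constituent. The function name, the (label, start, end) representation, and the toy topological-field labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def labeled_precision_recall(gold, predicted):
    """Compute labeled precision and recall over constituents.

    Both arguments are iterables of (label, start, end) tuples.
    This is an illustrative helper, not the paper's evaluation code.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a constituent is matched only when
    # its label and its span are identical in both analyses.
    matched = sum((gold_counts & pred_counts).values())
    total_pred = sum(pred_counts.values())
    total_gold = sum(gold_counts.values())
    precision = matched / total_pred if total_pred else 0.0
    recall = matched / total_gold if total_gold else 0.0
    return precision, recall


# Hypothetical topological-field analysis of a seven-token clause.
gold = [("VF", 0, 2), ("LK", 2, 3), ("MF", 3, 6), ("VC", 6, 7)]
# The predicted analysis gets the right boundary of MF wrong.
predicted = [("VF", 0, 2), ("LK", 2, 3), ("MF", 3, 5), ("VC", 6, 7)]

precision, recall = labeled_precision_recall(gold, predicted)
```

Because a refined label set changes which node labels exist, scores of this kind are only comparable when gold and predicted trees use the same labels.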
Similar resources
A Semi-Automatic, Iterative Method for Creating a Domain-Specific Treebank
In this paper we present the development process of NLP-QT, a question treebank that will be used for data-driven parsing in the context of a domain-specific QA system for querying NLP resource metadata. We motivate the need to build NLP-QT as a resource in its own right, by comparing the Penn Treebank-style annotation scheme used for QuestionBank (Judge et al., 2006) with the modified NP annot...
Treebank refinement: optimising representations of syntactic analyses for probabilistic context-free parsing
Iterative Transformation of Annotation Guidelines for Constituency Parsing
This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training d...
Core Arguments in Universal Dependencies
We investigate how core arguments are coded in case-marking Indo-European languages. Core arguments are a central concept in Universal Dependencies, yet it is sometimes difficult to match against terminologies traditionally used for individual languages. We review the methodology described in (Andrews, 2007), and include brief definitions of some basic terms. Statistics from 26 UD treebanks sho...